Estimating the causal effects of multiple intermittent treatments with application to COVID-19

To draw real-world evidence about the comparative effectiveness of multiple time-varying treatments on patient survival, we develop a joint marginal structural survival model and a novel weighting strategy to account for time-varying confounding and censoring. Our methods formulate complex longitudinal treatments with multiple start/stop switches as the recurrent events with discontinuous intervals of treatment eligibility. We derive the weights in continuous time to handle a complex longitudinal dataset without the need to discretize or artificially align the measurement times. We further use machine learning models designed for censored survival data with time-varying covariates and the kernel function estimator of the baseline intensity to efficiently estimate the continuous-time weights. Our simulations demonstrate that the proposed methods provide better bias reduction and nominal coverage probability when analyzing observational longitudinal survival data with irregularly spaced time intervals, compared to conventional methods that require aligned measurement time points. We apply the proposed methods to a large-scale COVID-19 dataset to estimate the causal effects of several COVID-19 treatments on the composite of in-hospital mortality and ICU admission.


Introduction
The COVID-19 pandemic has been a rapidly evolving crisis challenging global health and economies. Public health experts believe that this pandemic has no true precedent in modern times (Oh, 2020). While multiple COVID-19 vaccines have been developed across the globe, consensus on optimal hospital management of COVID-19 has not been achieved (Yousefi et al., 2020). In general, antiviral medications have been utilized to reduce direct pathogenic effects of viral invasion and replication, whereas immunomodulatory agents have been utilized to reduce pathogenic inflammation that has a negative impact on end-organ function. Emerging evidence from randomized controlled trials indicates that treatment modality effectiveness varies by infection severity and patient functional status. For example, antivirals offer enhanced protection from mortality when administered near symptom onset (Wang et al., 2020a), whereas immunomodulatory therapeutics such as glucocorticoids (RECOVERY Collaborative Group, 2021) or interleukin-6 receptor antagonists (Horby et al., 2021) yield enhanced patient benefits when given in the context of severe hypoxemia requiring hospitalization and/or mechanical ventilation. Several putative therapeutics for COVID-19 have also been evaluated within randomized controlled trials.
REMAP-CAP is a randomized, embedded, multi-factorial adaptive platform trial investigating the causal effects of multiple COVID-19 treatments alone and in combination. Evidence arising from the REMAP-CAP study has provided strong evidence base for the effect of anticoagulant medications in the context of infection severity, lack of efficacy of several existing therapeutic agents (e.g. hydroxychloroquine, azithromycin, lopinavir/ritonavir, Anakinra, and therapeutic anticoagulation in critically ill patients), and long-term survival benefits from the use of antiplatelets and interleukin-6 receptor antagonists (Florescu et al., 2023). REMAP-CAP also generated insufficient evidence of therapeutic benefit from hydrocortisone, and stronger evidence for benefits of therapeutic heparin in moderate severity infection. Further high qual-ity evidence arising from the RECOVERY trial (RECOVERY, 2021) has demonstrated the therapeutic futility of aspirin, lopinavir/ritonavir, colchicine, and convalescent plasma. In contrast, dexamethasone, baricitinib, monoclonal antibodies, and interleukin-6 receptor antagonists, when used in conjunction with corticosteroids, have been shown to reduce the risk of mortality and/or morbidity associated with COVID-19 (WHO Solidarity Trial Consortium and others, 2022). The IDSA Guidelines on the Treatment and Management of Patients with COVID-19 (Bhimraj et al., 2022) provide further details of evidence-based indications and contraindications for the aforementioned therapeutics.
Although randomized controlled trials (RCTs) are considered the gold standard for establishing causal evidence of therapeutic efficacy, they are enormously expensive and time consuming, especially in a time of crisis. Stringent inclusion and exclusion criteria also limit the generalizability of RCTs to frailer populations at higher risk for severe morbidity and mortality. To overcome these challenges, we study the causal effects of COVID-19 treatments on patient survival by leveraging the continuously growing observational data collected at the Mount Sinai Health System-New York City's largest academic medical system. Treatment modalities offered at the Mount Sinai Health System during the study period aligned with "IDSA Guidelines on the Treatment and Management of Patients with COVID-19" (Bhimraj et al., 2022) and were updated as revisions were released. We focus on four commonly used medication classes that are of most clinical interest: (i) remdesivir; (ii) dexamethasone; (iii) anti-inflammatory medications other than corticosteroids; and (iv) corticosteroids other than dexamethasone.
The complex nature of COVID-19 treatments, owing to differential physician preferences and variability of treatment choices attributable to evolving clinical guidelines, poses three major challenges for statistical analysis of nonexperimental longitudinal data with censored survival outcomes. First, treatment is not randomly allocated and the treatment status over time may depend upon the evolving patient-and disease-specific covariates. Second, the measurement time points during the follow-up are irregularly spaced. Third, there is more than one treatment under consideration. Patients can be simultaneously prescribed to various treatment combinations, or can be switched to a different treatment. Figure 1 illustrates the observed treatment trajectories for nine randomly selected patients during their hospital stays. Thus, the primary issue of nonrandom treatment allocation, when combined with irregularly spaced measurement time points and multiple time-varying treatments, leads to unique analytical challenges that necessitate sophisticated longitudinal causal inference approaches. Colors indicate on treatment. Lack of color corresponds to off treatment.
While previous work has shown that a continuous-time marginal structural model is effec-tive in addressing time-varying confounding and provides consistent causal effect estimators (Johnson and Tsiatis, 2005;Saarela and Liu, 2016;Hu et al., 2018;Hu and Hogan, 2019;Ryalen et al., 2020;Johnson et al., 2022), the development has been restricted to a single longitudinal treatment and therefore may not be directly applicable. We consider a joint marginal structural model to accommodate multiple longitudinal treatments in continuous time. The joint effects of multiple exposures on an outcome are of strong interest in epidemiologic research (Rothman et al., 2008;Howe et al., 2012). Hernán et al. (2001) and Howe et al. (2012) presented methods in the discrete time settings for estimating the joint effects of multiple timevarying exposures in HIV research. In COVID related research, there is a great need for estimating the joint or marginal effects of multiple COVID specific medications. There remains ambiguity as to which medication works best for COVID infected patients, particularly among those admitted into the hospital in earlier phases of the pandemic. In the absence of treatment guidelines, numerous medications and combinations of the medications had been used to treat COVID-19. As a result, a large repertoire of real-world clinical encounter data was generated.
A comprehensive examination of the joint or marginal effects of multiple treatments can be conducted by leveraging these datasets with a high level of granularity. This is made possible by virtue of a continuous-time joint marginal structural model and the estimation methods we develop for such a model. Specifically in this article, we consider a joint marginal structural model to estimate the causal effects of multiple longitudinal treatments in continuous time. To estimate causal parameters in the joint marginal structural model, we derive a novel set of continuous-time stabilized inverse probability weights by casting each treatment process as a counting process for recurrent events, allowing for discontinuous intervals of eligibility. In addition, we propose to use machine learning and smoothing techniques designed for censored survival data to estimate such complex weights. Through simulations, we demonstrate that our approach provides valid causal effect estimates and can considerably alleviate the consequence of unstable inverse probability weights under parametric formulations. We further undertake a detailed analysis of a large longitudinal registry data of clinical management and patient outcomes to investigate the comparative effectiveness of multiple COVID-19 treatments on patient survival. This work makes three major contributions to the causal inference literature. First, we offer a practicable framework to jointly estimate the causal effects of multiple time-varying treatments in continuous time. While methods for estimating the effect of one treatment at a time are relatively well-developed, methods for studying multiple longitudinal treatments are relatively scarce. Second, we develop a novel weighting scheme to deal with time-varying confounding on the continuous-time scale without the need to discretize the time and "bin" the data. Third, we strategically recast the problem of "start/stop" treatment switches as recurrent events, making it possible to draw inferences about highly complex time-varying treatments that are sparsely observed along the time continuum. Importantly, these three methodological contributions operate in concert under a single modeling framework. Thus, we provide a valuable analysis apparatus for datasets possessing highly complex structures, which are seen increasingly often as electronic health records data become increasingly accessible.

Notation and set up
We consider a longitudinal observational study with multiple treatments and a right-censored survival outcome. Denote t as the time elapsed from study entry (time 0), which is hospital admission in our COVID data example, and denote t o as the maximum follow-up time. In our COVID data example, the administrative censoring date was February 26, 2021, which defines the maximum follow-up time. Suppose each individual has a p-dimensional covariate process {L(t) : 0 ≤ t ≤ t o }, some elements of which are time-varying; by definition, the time-fixed elements of L(t) are constant over [0, t o ]. Let T denote time to an outcome event of interest such as death, with {N T (t) : 0 ≤ t ≤ t o } as its associated zero-one counting process. We consider W different medication classes (treatments), whose separate and joint causal effects on patient survival are of interest. We use A w (t) to denote the assignment of treatment w ∈ W = {1, . . . , W }, which can be characterized by a counting process; we let A w (t) = 1 if an individual is treated with w at time t and A w (t) = 0 otherwise (Johnson and Tsiatis, 2005). We refer to Figure 1 for a visualization of treatment processes, in which colors indicate A w (t) = 1 and lack of color corresponds to A w (t) = 0. Due to the intricate nature of the treatment processes, we defer an exhaustive exposition on recasting the treatment process within the recurrent event framework to Section 3.1.
Let C denote the time to censoring due to, for example, discharge or loss to follow up. We use the overbar notation to represent the history of a random variable, for example,Ā w (t) = {A w (s) : 0 ≤ s ≤ t} corresponds to the history of treatment A w from hospital admission up to and including time t andL(t) = {L(s) : 0 ≤ s ≤ t} corresponds to the covariate history up to and including time t. Following the convention in the longitudinal causal inference literature , we assume treatment decision is made after observing the most recent covariate information just prior to the treatment; that is, A w (t) occurs after L(t − ) for all w.
The observed data available for drawing inferences about the distribution of potential outcomes are as follows: the observed time to outcome event is T * = T ∧ C, with the censoring indicator ∆ T = I(T ≤ C). Note that both treatment processes {A w (t), w = 1, . . . , W } and the covariate processL(t) are defined for t ∈ [0, t o ] but are observed only at some and potentially irregularly spaced time points for each individual. For example, individual i may have covariates and treatment status observed at a set of unique, discrete time points from study entry t = 0 to his or her last follow-up time t K i ≤ t o .

Joint marginal structural model for survival outcomes
We consider a marginal structural model to estimate the joint causal effects of multiple timevarying treatments on patient survival. The most popular model specification is a marginal structural Cox regression model, for its flexibility in handling baseline hazard and straightforward software implementation when used in conjunction with the stabilized inverse probability weights (Hernán et al., 2001;Howe et al., 2012). When there is a strong concern that the proportional hazards assumption may not be satisfied across the marginal distribution of the counterfactual survival times, alternative strategies such as the structural additive hazards model can also be considered. Other modeling frameworks such as the structural nested models (Robins, 1992) can potentially be used to investigate the causal effect of a time-varying treatment on survival outcomes. For example, Lok (2008) presented a conceptual framework, as well as a mathematical formalization, for the application of structural nested models in drawing causal inferences about time-varying treatments in the presence of time-dependent confounding on a continuous-time scale (Robins, 1992;Mark and Robins, 1993). Lok (2017) further extended the theory of continuous-time structural nested models without assuming a deterministic (or constant) treatment effect. More recently, Yang et al. (2020) developed a doubly robust estimator for the continuous-time structural failure time models. However, the capacity of these methods to accommodate multiple longitudinal treatments, each characterized by multiple "start/stop" switches, has yet to be elucidated.
As a starting point, we present our methodology based on the marginal structural Cox regression model to address the aforementioned multiple analytical challenges (see Figure 1), which are increasingly common with large-scale electronic health records data. However, we acknowledge that extensions based on alternative modeling frameworks such as those introduced in (Young et al., 2020) may also be possible to to overcome these challenges. Despite our focus on continuous-time marginal structural Cox model, we hope this work will open new vistas for vibrant research on this topic.
For notational brevity but without loss of generality, we first consider W = 2 treatments.
Expansion of the joint marginal structural model and weighting scheme for W ≥ 3 treatments is discussed in Section 3.3. Specifically, we assume Tā 1 (t),ā 2 (t) follows a marginal structural proportional hazards model of the form where λ Tā 1 (t),ā 2 (t) (t) is the hazard function for Tā 1 (t),ā 2 (t) and λ 0 (t) is the unspecified baseline hazard function when treatment A 1 (t) and A 2 (t) are withheld during the study. The structural parameter ψ 1 encodes the instantaneous effect of treatment A 1 (t) (when treatment 1 is administered at time t) on Tā 1 (t),ā 2 (t) in terms of log hazard ratio while A 2 (t) is withheld during the study. Similarly, the structural parameter ψ 2 corresponds to the instantaneous treatment effect for A 2 (t) in the absence of A 1 (t). The instantaneous multiplicative interaction effect of A 1 (t) and A 2 (t) is captured by the structural parameter ψ 3 , allowing the possibility that the instantaneous effect of A 1 (t) depends on A 2 (t). In addition, model (1) can be elaborated by letting λ Tā 1 (t),ā 2 (t) (t) depend on baseline covariates or by using a stratified version of λ 0 (t), with straightforward adaptations to the weighted estimating equations (introduced in Section 3) required to identify the structural model parameters. Model (1) implicitly assumes that the instantaneous treatment effect is constant across the follow-up. This model assumption is reasonable given that the COVID-related hospitalization is generally short and medications are prescribed for days in succession. Finally, model (1) is a continuous-time generalization of the discrete-time model considered by Howe et al. (2012) for estimating the joint survival effects of multiple time-varying treatments.
Model (1) offers two features for the estimation of treatment effects. First, the counterfactual survival function can be expressed as Therefore, causal contrasts can be performed based on any relevant summary measures of the counterfactual survival curves such as median survival times or restricted mean survival times.
In Section 5, we demonstrate the causal comparative effectiveness of COVID-19 treatments using counterfactual survival probablity at 14 days following hospital admission and 14-day restricted mean survival time. Second, model (1) allows for estimating the causal effects of static treatment regimens of complex forms as observed in clinical settings. For instance, an intervention could be represented asā 1 (6 days) = 1 6×1 , denoting the prescription of treatment 1 for a duration of 6 days. A more complex example is {ā 1 (12 days),ā 2 (12 days)} = {(1 6×1 , 0 6×1 ), (0 6×1 , 1 6×1 )}, which indicates the assignment of treatment 1 for an initial 6-day period, followed by a complete switch to treatment 2 for an additional 6 days. Finally, we note that the focus of this study is treatment assignment in a population of COVID-19 patients who were admitted to the hospital following a confirmed positive diagnosis. Given the nature of the hospital and disease setting, compliance to the assigned treatment is not a major concern within our study. In the presence of noncompliance, however, our causal effects should be interpreted in an intention-to-treat manner.

Estimating Structural Parameters in Continuous Time
To obtain a consistent estimator for ψ = {ψ 1 , ψ 2 , ψ 3 } in model (1) using longitudinal observational data with two treatments, we introduce the following causal assumptions and maintain them throughout the rest of the article: (A1) Consistency. The observed failure times, Similarly for the observed censoring times, The consistency assumption implies that the observed outcome corresponds to the counterfactual outcome under a specific joint treatment trajectory {ā 1 (t),ā 2 (t)} when an individual actually follows treatment {ā 1 (t),ā 2 (t)}. This is an extension of the consistency assumption developed with a single time-varying treatment (Robins, 1999;Wen et al., 2021) to two timevarying treatments.
(A2) Conditional Exchangeability. Alternatively referred to as sequential randomization, this assumption states that the initiation of treatment at time t among those who are still alive and remain in the study is conditionally independent of the counterfactual survival time Tā 1 (t),ā 2 (t) conditional on observed treatment and covariate histories (Robins et al., 2000). Mathematically, where ρ A 1 ,A 2 (t) is the joint intensity of the joint counting processes defined by A 1 (t) and A 2 (t).
In a similar manner, we maintain the assumption of conditional exchangeability for censoring.
The assumption entails that ∀t ∈ [0, t o ], Our conditional exchangeability assumption is a continuous-time generalization of the usual sequential randomization assumption for the discrete-time marginal structural models (Robins, 1999;Howe et al., 2012).
(A3) Positivity. We assume that at any given time t, there is a positive probability of initiating a treatment, among those who are eligible for initiating at least one treatment, for all configurationsŌ(t − ): For a pair of joint treatments (A 1 , A 2 ), at a given time t, individuals with treatment status (0, 0), (0, 1) or (1, 0) are "eligible for" initiating at least one treatment. Individuals with treatment status A 1 (t) = 1 and A 2 (t) = 1 are by nature not eligible for initiating either A 1 or A 2 at time t (see our recurrent event framework in Section 3.1). For this reason, we only need to assume the positivity assumption when the individual is off that specific treatment, i.e., at risk for initiating that treatment. To refrain from adding complexity to the already intricate methodology, we do not consider treatment discontinuation (from 1 to 0) as a stochastic process. This decision also is justified clinically because COVID medication is typically prescribed for a specific duration, e.g., administer dexamethasone at a doseage of 6 mg once daily for a duration of 10 days (RECOVERY Collaborative Group, 2021). In Section 5.2, we examine and discuss the validity of these structural (nonparametric) assumptions, along with their implications, within the context of our COVID-19 data application.

Framing repeated treatment initiation as recurrent events
As depicted in Figure 1, the observed treatment patterns for COVID-19 demonstrate a high degree of complexity, owing to considerable variations in established treatment protocols and the heterogeneity in clinician preferences during the course of the pandemic. Individuals might discontinue a particular treatment and subsequently resume it at a later point, or they could be transitioned entirely to an alternative treatment. Patients may also receive multiple treatments simultaneously for a specific duration. From the counting process perspective, each treatment can be viewed as a recurrent event process, with discontinuous intervals of treatment eligibility (Andersen and Gill, 1982). Specifically, reframing treatment initiation as recurrent events can effectively capture two key aspects of our observational data: (i) once a patient receives a treatment, they cannot be prescribed the same treatment again while they are still on that treatment; and (ii) after the patient is off the treatment, they become eligible, or at risk for re-initiating the treatment. Note that the off-treatment period is represented by the lack-of-color period for a treatment in Figure 1.
To formalize the treatment initiation process, we first consider a univariate treatment process N A w . We assume that jumps in A w (t), i.e., dA w (t), are observed on certain subintervals of [0, t o ] only. Specifically for individual i, we observe the stochastic process A w,i (t) on a set of intervals Implications of this representation concerning treatment initiation timing are provided in Supplementary Section 3.
Define a censoring or filtering process by D A w i (t) = I(t ∈ E w,i ), and the filtered counting process by where q indexes the qth treatment initiation. We assume conditional independence among occurrences of treatment initiation given all observed history (Andersen and Gill, 1982), and that the set E w,i is defined such that D A w i (t) is predictable. The assumption of conditional independence is considered plausible in the context of COVID-19 treatment, as the associations between treatment initiations for each patient are more likely driven by key time-varying covariates (e.g., oxygen levels). Consequently, by conditioning on these time-varying covariates, the time intervals between treatment initiations become conditionally uncorrelated, supporting this conditional independence assumption for this particular application. The observed data with occurrences on the set E w,i can therefore be viewed as a marked point process generating the filtration (F D w,t ) (Andersen et al., 1993). Similarly, we denote the filtration generated by the counting process We assume A w,iq (t) follows Aalen's multiplicative intensity model (Aalen et al., 2008), with respect to the filtration (F w,t ). In this model, the intensity process of A w,iq (t) is denoted as ρ w,iq (t, θ). The person-specific initiation intensity, parameterized by θ, is represented by α iq (t, θ) = α 0 (t)IR(θ,L i (t)), where α 0 (t) corresponds to a common baseline intensity.
Furthermore, the intensity ratio function for treatment initiation, which depends on the personspecific history, is given by IR(θ,L i (t)). The at-risk function, Y w,iq (t), is defined as follows: Y w,iq (t) = 1 indicates that person i is eligible for the qth initiation of treatment w just before time t within the interval [t, t + dt), while Y w,iq (t) = 0 implies ineligibility. This model assumes that the intensity of initiating treatment w at time t can be decomposed into a product of the person-specific intensity function and the at-risk process. Within the context of COVID-19 treatment, the intensity function represents the propensity for receiving treatment w, attributable to (time-fixed and time-varying) factors such as disease aggressiveness and genetic predisposition. Concurrently, the at-risk process represents patients at time t who have not yet received and are eligible for treatment w. It follows that the filtered counting process N A w iq (t) follows the multiplicative intensity model with respect to (F D w,t ) (Andersen et al., 1993).
With two treatments, model (4) can be directly extended for the joint treatment initiation process as where Y A 1 ,A 2 iq (t) is the at-risk process for the qth treatment initiation with the filtering process jointly defined by A 1 and A 2 .

Derivation of the continuous-time weights
We first consider the case without right censoring. Under assumptions (A1)-(A3), a consistent estimator of ψ can be obtained by solving the weighted partial score equation (Johnson and Tsiatis, 2005;Hu et al., 2018), where Ω A 1 ,A 2 (t K i ) is the weight that corrects for time-varying confounding for time-varying is a modified version of the weighted mean of Z over observations still at risk for the outcome event at time t (R T t is the risk set at time t). In equation (7), we define the weighted at-risk indicator for outcome Y * T is the at-risk function for the outcome event, and r(a 1 , a 2 , t) = exp{ψ 1 a 1 (t)+ψ 2 (t)a 2 (t)+ψ 3 a 1 (t)a 2 (t)}. In Supplementary Section 4, we provide a heuristic justification for the consistency of the weighted estimating equations approach based on the use of Radon-Nikodym derivative (Murphy et al., 2001;Hu et al., 2018).
In the discrete-time setting with non-recurrent treatment initiation, the stabilized inverse probability weights (we suppress subscript i for brevity) are given in the prior literature (Hernán et al., 2001;Howe et al., 2012): where t k 's are a set of ordered discrete time points common to all individuals satisfying 0 = (8)  We now generalize the discrete-time weights (8) to the continuous-time setting so that the methods can be applied to data with irregularly spaced time intervals without artificially aligning time points. We first partition the time interval [0, t] into a number of small time intervals, Recall that treatment initiation, or the jumps of A w (t), dA w (t), is observed on a number of only. That is, conditional on historyL(s), the occurrence of treatment initiation for an individual in [s, s + ds)I(s ∈ E) is a Bernoulli trial with outcomes dA w (s) = 1 and dA w (s) = 0. Then the term, which takes the form of the individual partial likelihood for the filtered counting process When the number of time intervals in [0, t] increases and ds approaches zero, the finite product over the number of time intervals of the individual partial likelihood will approach a product integral (Aalen et al., 2008), that is, (9) then provides a basis for generalizing the discrete-time weights (8) to the continuous-time setting.
The first quantity in (9) is equal to the finite product over the jump times and the second quantity is the survival function for treatment initiation. Thus, the continuous-time weight that corrects for potential selection bias associated with A w is the product of the density function f A w and survival function S A w of the filtered counting process for treatment A w . In alignment with conventions established in prior literature (Hernán et al., 2001;Howe et al., 2012), the treatment weight for joint treatments A 1 and A 2 assumes a treatment order. By positing that treatment A 1 is administered infinitesimally earlier than treatment A 2 , the intensity of initiating treatment A 2 at time t can be dependent on the status of treatment A 1 at time t, A 1 (t), and the status of treatment A 2 at time t − , A 2 (t − ). Furthermore, the intensity of initiating treatment A 1 at time t can rely on both A 1 (t − ) and A 2 (t − ). This assumption of ordered treatment administration will depend on specific clinical contexts. In Section 5.4, we carried out a sensitivity analysis to examine the impact of varying treatment order assumptions on the estimation of causal effects. Finally, based on (9), the continuous-time stabilized weight for each individual is explicitly obtained as Supplementary Section 5 provides additional discussions of the weight formulation in connection to recurrent treatment initiations.
Turning to censoring, under the conditional exchangeability assumption (A2), the censoring process is covariate-and treatment-dependent. To correct for selection bias due to censoring, we additionally define a weight function associated with censoring, where S C is the survival function associated with the censoring process, and This leads to a modification of the estimating equation for ψ, whereZ * * = k∈R T

Extensions to more than two time-varying treatments
While we present our methodology using two longitudinal treatments, our approach can be readily extended to accommodate multiple time-varying treatments in a straightforward manner. Theoretically, a fully interacted version of Model (1) can be developed to encompass all main effects of a w (t) for all w ∈ W, along with their respective interactions. Considerations of clinical relevance and data sparsity for treatment combinations may further inform the inclusion of interaction terms within the structural model.
where Z(A 1i , . . . , A W i , t) is a vector of length W + M representing observed time-varying treatments A w (t) and multiplicative terms of those treatments whose causal interaction effects are of interest, andZ * * is evaluated using the weighted risk set Y * * T . The estimation of joint treatment weights, denoted by Ω A 1 ,...,A W (t K i ), can be achieved by assuming predetermined sequence in which treatments are administered, as outlined in Section 3.2. The estimation of censoring weights Ω C (G i ) adheres to the same methodology described in Section 3.2, except that the survival function in the denominator of Ω C (G i ) is replaced by

Estimation of the causal survival effects
We consider four ways in which the continuous-time treatment weights can be estimated: (i) fitting a usual Cox regression model for the intensity process of the counting process of treat- estimating the density function f A w and survival function S A w from the fitted model with the Nelson-Aalen estimator for the baseline intensity, and finally calculating the weight Ω A 1 ,A 2 (t K i ) for each individual; (ii) smoothing the Nelson-Aalen estimator and in turn f A w and S A w estimated from the fitted Cox regression model by means of kernel functions (Ramlau-Hansen, 1983), and calculating the weights using the smoothed version of f A w and S A w ; (iii) fitting a multiplicative intensity tree-based model (Yao et al., 2022) in which the functional form of the intensity ratio for treatment initiation is flexibly captured to estimate the f A w and S A w and calculate the weights; (iv) smoothing the Nelson-Aalen estimator of the baseline intensity from the tree-based model and calculating the weights using the smoothed version of f A w and S A w . Among these approaches, (i) relies on the parametric assumptions about the intensity ratio relationships between the treatment initiation process and covariate process and may be subject to model misspecification and bias for estimating causal effects. Compared to the Nelson-Aalen estimator which includes discrete jumps at event occurrences, the kernel function estimator in (ii) may help alleviate the issue of extreme or spiky weights, and has also been shown to be a consistent and asymptotically normal baseline intensity estimator (Andersen et al., 1993). Approach (iii) leverages a recent random survival forests model (Yao et al., 2022) that can accommodate time-varying covariates to mitigate the parametric assumptions and attendant biases associated with the usual Cox regression. In the context of baseline time-fixed treatments, previous research has used similar machine learning techniques to improve propensity score weighting estimators for both a point treatment (Lee et al., 2010;Chang et al., 2022) and a time-varying treatment (Shen et al., 2017). Additionally, these machine learning methods have been utilized to yield more accurate causal effect estimates in the presence of censored survival data (Hu et al., 2021(Hu et al., , 2022a. Finally, approach (iv) smooths the baseline intensity estimated from the survival forests for estimating the stabilized inverse probability weights, and serves as an additional step to smooth over the potentially spiky weights. In Section 4, we compare the performances of these four strategies to estimating the continuous-time weights to generate practical recommendations. In addition, the censoring weight function Ω C (G i ) can be estimated in a similar fashion via any one of these four approaches. Additional details of kernel function smoothing in approach (ii) and random survival forests in approach (iii) are presented in Supplementary Section 1.
To accommodate the time-varying covariate process and account for the recurrent nature of treatment initiation, we fit a survival model to the counting process-stylized data. Each individual is represented by several rows of data corresponding to nonoverlapping time intervals of the form (start, stop]. To allow for discontinuous intervals of eligibility, when defining multiple when the individual is currently being treated and therefore no longer eligible for initiating the treatment. Finally, since our estimators for ψ is a solution to the weighted partial score equation (11), we can use the robust sandwich variance estimator to construct confidence intervals for the structural parameters; the details of the robust sandwich variance estimator is provided in Supplementary Section 1. Alternatively, nonparametric bootstrap can be used to construct confidence intervals that can take into account of the uncertainty of the continuous-time weight estimation. In practice, the robust sandwich variance estimator has been shown to be at most conservative under the discrete-time setting (Shu et al., 2021), and we will empirically assess the accuracy of this variance estimator with continuous-time weights via simulations.

Simulation design
We carry out simulations to investigate the finite-sample properties of the proposed weighting estimators for the marginal structure Cox model parameters. We simulate data compatible with the marginal structural Cox model by generating and relating data adhering to the structural nested accelerated failure time (SNAFT) model (Young et al., 2008). A representation of a SNAFT model for time-varying treatment a is (Hernán et al., 2005) where T0 is the counterfactual failure time under no treatment. This version of SNAFT assumes that both the left and right sides of the equation follow the same distribution. Robins (1992) developed a simulation algorithm to generate data adhering to the SNAFT model under the discrete-time version of the identifying assumptions (A1)-(A3) in Section 3. Young et al. (2008) showed that, under the same identifying assumptions, data adhering to a marginal structural Cox model of the form λ Tā (t) = λ 0 (t) exp [ψ msm a(t)] can be simulated from a SNAFT model with ψ aft = ψ msm by adding an additional quantity to the term exp [ψ aft a(t)]. In particular, when T0 has an exponential distribution, the additional quantity is zero, hence the structural nested AFT model simulation algorithm (Robins, 1992) can be used to appropriately simulate data compatible with the marginal structural Cox model under complex time-varying data structures. Building on these previous works, we extend the simulation algorithm described in Karim et al. (2017) to generate data from the joint marginal structural Cox model, while allowing for multiple time-varying treatments with discontinuous intervals of treatment eligibility and for both continuous and discrete time-varying confounders.
Throughout we simulate an observational study with n = 1000 patients and two timevarying treatments A 1 (t) and A 2 (t). We assumeL(t) is appropriately summarized by a contin-uous time-varying confounding variable L 1 (t) and a binary time-varying confounding variable L 2 (t). The simulation algorithm includes two steps. In step (1), we consider nonlinear terms for the continuous variable L 1 (t k ) and the interaction term Therefore, L(t k ) is a time-dependent confounder affecting both the future treatment choices and counterfactual survival outcomes. The simulation of treatment initiation is placed in the recurrent event framework. Once treatment is initiated at time t k , treatment duration following initiation is simulated from a zero-truncated Poisson distribution. In step (1), we generate a longitudinal data set with 100 × 1000 observations (100 aligned measurement time points for each of n = 1000 individuals). In step (2), we randomly discard a proportion of follow-up observations for a randomly selected subset of individuals (Lin et al., 2004); and in the resulting data set, the individuals will have varying number of follow-up measurement time points, which are also irregularly spaced. Supplementary Section 2 provides the full pseudo-code for simulating data under the marginal structural Cox model with two time-varying treatments. To evaluate the performance characteristics of the proposed method with respect to sample size, we additionally examine five smaller sample sizes, n = 250, n = 350, n = 500, n = 650 and n = 850, all featuring 100 follow-up time points for each individual.
Our simulation parameters are chosen so that the simulated data possess similar characteristics to those observed in the motivating COVID-19 data set. The treatments A 1 and A 2 are simulated to resemble dexamethasone and remdesivir such that: (i) about 20% patients did not take any of the anti-viral and anti-inflammatory medications aimed at treating COVID-19; (ii) among those who were treated, 62% took dexamethasone only, 25% took remdesivir only and 13% took both (either concurrently or with treatment switching); (iii) the number of initiations for both treatments ranges from 0 to 4 with the average medication duration about 5 days. The values of treatment effect parameters ψ 1 and ψ 2 were set to yield a 6.7% mortality proportion among those who received dexamethasone and a 4.9% mortality proportion in those treated with remedesivir.

Comparison of methods
We conduct two sets of simulations to investigate the finite-sample performance of our proposed joint marginal structural survival model in continuous time (JMSSM-CT). First, we compare how accurate the four weighting estimators described in Section 3.4 estimate the structural parameter ψ. For ease of comparison, in this simulation we consider the structural parameter ψ as the target of inference. Because the counterfactual survival functions directly depend on the structural parameter ψ, unbiased estimation of ψ will lead to improved estimation of counterfactual survival functions or any summaries of these counterfactual quantities (such as median counterfactual survival time or restricted mean survival time). Second, we use the best weighting estimator, suggested by the first set of simulation, for JMSSM-CT, and compare it with the joint marginal structural model that requires aligned discrete time points (JMSM-DT). To ensure an objective comparison, we use the random forests (Yao et al., 2022) and adapt it into our proposed recurrent event framework to estimate the weights for JMSM-DT. In addition, we implement both JMSSM-CT and JMSM-DT on the "rectangular" simulation data with 100 aligned time points for each individual and on the "ragged" data with irregular observational time points. The performance on the rectangular data will be considered as the benchmark performance, based on which we will assess the relative accuracy of JMSSM-CT and JMSM-DT when estimating the structural parameters with the "ragged" data. Approach (ii) uses kernel function smoothing of the Nelson-Aalen estimator in approach (i).

Approach (iii) uses a survival forests model that accommodates time-varying covariates and
Nelson-Aalen estimator for baseline intensity. Approach (iv) uses kernel function smoothing of the Nelson-Aalen estimator in approach (iii).

Performance characteristics
To assess the performance of each method, we simulate 250 observational data sets using the above approach, and evaluate the absolute bias, root mean squared error (RMSE) and covarage probability (CP) for estimating the ψ. The CP is evaluated on normality-based confidence intervals with the robust sandwich variance estimator. Additionally, we empirically evaluate the convergence rate of the proposed method by measuring how rapidly the RMSE decreases as the sample size increases (Hu et al., 2020).
In Figure 2, we compared the biases in estimating ψ 1 and ψ 2 for a sample size of n = 1000 by employing the four weight estimation approaches. The weighting estimator (iv) using both the flexible tree-based survival model and kernel function estimator of the treatment initiation intensity yielded the lowest biases in estimating both ψ 1 and ψ 2 . By contrast, the weighting estimator (i) via the usual main-effects Cox regression model, susceptible to model misspecification, along with the Nelson-Aalen baseline intensity estimator produced the largest estimation bias. Applying the kernel function smoothing to the Nelson-Aalen estimator led to bias reduction for both the Cox (approach (ii)) and tree-based survival model (approach (iv)) for the treatment process. Flexible modeling of the intensity ratio function has a larger effect in reducing the bias in structural parameter estimates than smoothing the nonparametric baseline intensity estimator. For example, compared to approach (ii), approach (iii) further reduced the mean absolute bias (MAB) in estimatingψ 1 by approximately 67%. Supplementary Table 1 summarizes the MAB, RMSE and CP for the four weighting estimators and similarly suggests that approach (iv) led to the smallest MAB and RSME, and provided close to nominal CP with the robust sandwich variance estimator.   Figure 1; patients could be simultaneously taking two or more treatment classes, or they could switch from one treatment class to another during their hospital stays. Treatment alterations or discontinuations may be influenced by indications of therapeutic failure, such as deteriorating oxygen saturation levels, adverse side effects like bleeding or disseminated intravascular coagulation, or evidence of therapeutic effectiveness, such as improved oxygen saturation levels.
Following suggestions by our clinician investigators, we assumed that the following time-  Table   4). The time-varying confounding variables were D-dimer levels, serum creatinine level and patient oxygen levels.
The average age of this sample population is 64.6 with a standard deviation of 18.1. About 54% of the patients were male and 46% female. The Hispanics accounted for about 26% of the patient population and the racial composition is 29% Whites, 25% Blacks, 6% Asians and 40% Other. Summary statistics for time-fixed confounders are presented in Supplementary   Table 5. Time-varying confounders were measured repeatedly over the course of hospital stay. We considered a composite outcome, ICU admission or in-hospital death, whichever occurs first. The outcome may be right censored by hospital discharge or administratively censored on t o = February 26, 2021, the date on which the database for the current analysis was locked.

Marginal structural modeling, assumptions and estimands
We implemented all four approaches discussed in Section 3. Besides the aforementioned structural assumptions, the fitted JMSSM-CT model itself includes proportional hazards as a modeling assumption. Therefore, our primary analysis should be interpreted under the condition that the marginal structural Cox model is correctly specified.
When this condition holds, the structural regression parameter represents a causal hazard ratio, as this is precisely the case where the hazard ratio parameter can be written as a ratio between two log counterfactual survival functions. However, we acknowledge that more generally a time-varying hazard ratio parameter lacks causal interpretation due to built-in selection bias issues; see Hernán (2010)

Results
Using the stabilized inverse probability weights to correct for time-varying confounding and censoring, the structural model parameter estimatesψ (log hazard ratio) and the associated 95% confidence intervals are provided in Table 2. Echoing the findings from our simulation study (Section 4), the weighting estimator (iv), using the random survival forests model and kernel function smoothing of the Nelson-Aalen estimator, produced the narrowest confidence intervals. By contrast, the weighting estimator (i), using the main-effects Cox regression model and non-smoothed Nelson-Aalen estimator, led to the widest confidence intervals. As a result, using the weighting estimator (iv), we observe a statistically significant treatment benefit (in log hazard ratio) with dexamethasone (−.2(−.35, −.06)) and remedesivir (−.53(−.75, −.31)), and added treatment benefit if remedesivir is used in combination with corticosteroids other than dexamethasone. Using the weighting estimator (i), none of the main or interactive treatment effects appeared to be statistically significant.  To obtain further insights into the operating characteristics of each method, we summarize the distribution of the estimated individual weights in Table 3. As one can clearly see, the weighting estimator (i) produced a substantial amount of extreme weights -the minimum of .0001 and maximum of 63. By comparison, the estimator (iv) generated no spiky weights, with the mean value of close to one. There is little difference in the weight distribution between estimator (ii) and estimator (iii), both of which mitigated the issue of extreme weights, but not to the same degree as the estimator (iv). Figure 4 shows the side-by-side comparison of the time-varying weights at 7, 14, 21 and 28 days since hospital admission estimated using the four weighting estimators. The weighting estimator (iv) produced no extreme weights at any of the time points. An increasing amount of extreme weights was generated when the modeling flexibility decreased or when the baseline intensity estimator was not smoothed.    Table 7, which show a high degree of similarity to the results displayed in Table 4 using our marginal structural portional hazards model.

Sensitivity analysis to assess the impact of treatment ordering
When multiple time-varying treatments are under investigation, the analyst may face the question of the optimal treatment ordering by which the joint treatment weights Ω A 1 ,...,A W (t K i ) (Section 3.3) can be estimated. In our simulation and case study, we used the default decreasing order of "treatment sizes". Note that the "treatment sizes" considered here do not refer to the sizes of mutually exclusive treatment groups, but simply the number of patients who have received the treatments at some point during the hospital stay. In our COVID-19 dataset, 7830 patients took dexamethasone (A 1 ) at some points in time following hospital admission, 4943 took corticosteroids other than dexamethasone (A 2 ), 4103 received remedesivir (A 3 ), and 2844 were treated with anti-inflammatory other than corticosteroid (A 4 ). We estimated the joint treatment weights by positing that A 1 is administered infinitesimally earlier, followed se- quentially by A 2 , A 3 and A 4 . This implies that upon estimating the joint treatment weights, the intensity of initiating treatment A 1 at time t can depend on A 1 (t − ), A 2 (t − ), A 3 (t − ) and A 4 (t − ). By comparison, the intensity of initiating treatment A w can be conditional on A w (t − ) and A 1 (t), . . . , A w−1 (t), for 2 ≤ w ≤ 4. Since the proposed method is inspired by large-scale electronic health records data on COVID-19, it is well-suited for sizable longitudinal datasets. Optimal performance can be achieved with a sample size of at least 1000.
We acknowledge several limitations of our study that could merit future investigations.
First, we have primarily focused on developing and implementing the continuous-time weighting scheme to address confounding bias with complex observational data. Due to the complexity in the construction of weights, we have only empirically explored convergence rates in simulations. A more in-depth theoretical investigation into the convergence rates of continuous-time weighted JMSSM is warranted in future research, particularly when the weights are estimated by machine learning algorithms. A promising direction to pursue is to explore the possibility of deriving an efficient influence function for our JMSSM (Hines et al., 2022), which has been demonstrated as a vehicle to integrate machine learners for nuisance functions such that the final estimator achieves the standard root-n parametric rate for causal inference (Chernozhukov et al., 2018). These existing results, however, are typically based on a time-fixed treatment, and an extension of those theoretical results to time-varying treatments and continuous-time data would be particularly instrumental and can suggest ways to further improve our proposed estimators. Second, our developments have not addressed the potential challenge arising from competing events. Due to lack of information on specific causes of death in the motivating COVID-19 dataset, we have considered the time to in-hospital death (overall death) as a composite outcome of interest and answered the question on comparative effectiveness on overall mortality. When scientific interest lies in the counterfactual cumulative incidence on patient death due to a specific cause, our JMSSM may not be directly applicable without modifications. One potential approach is to consider a structural model based on the cumulative incidence, mimicking the existing development with a time-fixed treatment (Ozenne et al., 2020).
Assuming a time-fixed treatment and discrete-time survival data, Young et al. (2020) provided an inspiring discussion on different versions of causal estimands with competing risks data, and provided recipes for estimation based on g-formula and inverse probability weighting. It would also be useful to investigate the extensions of those results to time-varying treatments and possibly continuous-time survival data. Third, our analysis of COVID-19 dataset has assumed a correctly specified marginal structural proportional hazards model (Hernán et al., 2001). Under the proportional hazards assumption, the structural regression parameters can be causally interpreted as they can be derived as contrasts defined on log counterfactual survival functions (Martinussen et al., 2020). In more general settings without proportional hazards, however, the hazard ratio parameters are challenging to interpret (Hernán, 2010). To partially account for such challenges in interpretation, we have also considered the structural model as where ρ 0 (t) is the baseline intensity rate function, IR L (t), θ > 0 is the (time-dependent) intensity ratio function parameterized by θ (note that IR L (t), θ can be nonparametrically estimated via machine learning techniques such as the random survival forests(?)), and Y (t) is the at risk indicator. The cumulative baseline intensity is P 0 (t) = t 0 ρ 0 (s)ds. The kernel function estimator for ρ 0 (t) is derived by smoothing the increments of the Nelson-Aalen estimator ofP 0 asρ where K is the kernel function, which is a bounded function on [−1, 1] and has integral 1, and the bandwidth b is a positive parameter governing the amount of smoothness. Commonly used kernel functions include the standard normal density function K(x) = ϕ(x) and the Epanechnikov kernel function K(x) = 0.75(1 − x 2 ), |x| ≤ 1. ? shows that the kernel function estimatorλ 0 (t) is consistent and asymptotically normal provided that there exists a sequence of positive constants {a n }, increasing to infinity as n → ∞, and that the bandwidth tends to zero more slowly than a −2 n as n → ∞. Placed in the framework of recurrent event for a treatment A w that can have multiple "start/stop" switches as described in Section 3.1, the intensity of the qth treatment initiation ρ A w q (t) is smoothed by means of kernel function estimator of the baseline intensity function ρ A w 0q (t).
In our simulations (Section 5) and COVID-19 case study (Section 6), we used both the standard normal density ϕ(x) and Epanechnikov kernel functions and they yielded similar results. We presented results from the normal density kernel function. Following ?, we chose the optimal bandwidth b as the value that minimizes the mean integrated squred error (MISE)

Random survival forests accommodating time-varying covariates
Approach (iii) uses a recent random survival forests model (?) that accommodates time-varying covariates to reduce the parametric assumptions about the form of the intensity ratio function required by the usual proportional intensity regression model. ? proposed a forest method that estimates the survival function by three steps. In step (1), we reformat the data in the counting process structure, that is, for individual i, the time-varying covariate L (i) (t) will be represented as . Pool the counting process styled records from N subjects to create a list of pseudo-subjects, (1) The set of pseudo-subjects is treated as if they were independent.
Step (2) is to apply the forest algorithms on the reformatted dataset given in (1), to fit a model. In step (3), given a particular stream of covariate values at the corresponding time values, a survival function estimate is constructed based on the outputs of the forest algorithms.
We now briefly describe the forest algorithms. ? extended the relative risk forests, which combines the relative risk trees (?) with random forest methodology (?), for LTRC data by modifying the splitting criteria. The splitting criterion under the relative risk framework is to maximize the reduction in the one-step deviance between the log-likelihood of the saturated model and the maximized log-likelihood. Let R h denote the set of observations that fall into the node h. Let P 0 index the baseline cumulative intensity function, φ h represent the nonnegative relative risk of the node h, and t l and δ l be the time and event indicator of the lth observation ∀l ∈ R h . Given the right censored observations (t l , δ l ), the full likelihood deviance residual for node h is defined as Two steps are needed to modify the splitting rule and obtain the deviance residual appropriate for LTRC data (?). First, compute the estimated cumulative intensity functionP 0 (·) based on all pseudo-subject observations. Second, replaceP 0 (t l ) in (2) withP 0 (t ′ l+1 ) −P 0 (t ′ l ), and replace δ l in (2) with δ ′ l . We refer to ? for more detailed description of the random survival forests model.

Variance estimation
Since our estimators for ψ (marginal structural proportional hazards model parameter) can be considered as solution to the weighted partial score equation, we consider the robust sandwich variance estimator as a convenience device to construct confidence intervals. The robust sand-wich variance estimator has been considered, for example, in ?, and ??, and has been shown to be at most conservative under the discrete-time setting. We use this estimator for inference in conjunction with our continuous-time stabilized inverse probability weights, and formally evaluate its performance in our simulations. We briefly describe the robust sandwich variance estimator below. With the continuous-time weights, we focus on equation (11) in the main text with two treatments: where Ω A 1 ,A 2 Ω C (G i ) is the weight for time-varying treatments A 1 and A 2 and censoring (in In the above definition, R T t refers to the collection of subjects still at risk for the outcome event at time t, Y * * T is the at-risk function for the outcome event, and r(a 1 , a 2 , t) = exp{ψ 1 a 1 (t) + ψ 2 (t)a 2 (t) + ψ 3 a 1 (t)a 2 (t)}. Now further define with a ⊗2 = aa ⊤ for any vector a. Then the robust sandwich variance estimator takes the form and u; ψ)du is the martingale based on the counting process for outcome event. The sandwich variance estimator is obtained when both Σ 0 and Σ 1 are evaluated at the estimated weights andψ.
Because the above robust variance estimator considers the weights Ω A 1 ,A 2 Ω C (G i ) as fixed known values (?), it could result in conservative (but still valid) inference. With time-invariant weights estimated by logistic regression in the cross-sectional treatment setting, the corrected robust sandwich variance estimator has been derived to achieve improved variance estimation for hazard ratio parameters (?). However, an extension to continuous-time weights is not trivial and currently unavailable. Alternatively, resampling method such as the bootstrap method (??) could be used to make more robust inference for ψ that accounts for the uncertainty of the weights.

Additional details of recurrent events formulation
To formalize the treatment initiation process, we first consider a univariate treatment process N A w . We assume that the jumps of A w (t), i.e., dA w (t), is observed on certain subintervals of [0, t o ] only. Specifically for individual i, we observe the stochastic process A w,i (t) on a set of intervals individual i has J i treatment initiations. A special case where J i = 1 and U w,iJ i = t w,iK i corresponds to the situation where individual i is continuously eligible for treatment initiation and has not been treated with w during the follow-up. Second, once treatment is initiated, is no longer stochastic until person i discontinues the treatment. This also suggests that the jth treatment initiation is observed at U w,ij . This implication pertains to the "off" treatment period (referred to as the lack-of-color period in Figure 1), during which a patient becomes eligible to receive the treatment. Third, we have In words, treatment status is equal to one deterministically on the discontinuous intervals of ineligibility (i.e., on treatment period). This implication involves the "on" treatment period (represented by the colored segment in Figure 1). Once the treatment commences at a specific time point, the patient continues with the treatment for a predetermined duration, throughout which the treatment status remains deterministically at one.

A justification for consistency using the Radon-Nikodym derivative
This section offers a heuristic rationale for the treatment weights employed in fitting our structural proportional hazards model. We employ arguments analogous to those presented by ? and ?, who utilized a Radon-Nikodym (R-N) derivative developed by ? to construct and estimate weights that lead to consistent estimates of structural parameters in the context of censored survival outcomes and continuous outcomes, respectively. For brevity, we demonstrate the use of R-N assuming no censoring. For the scenario involving right censoring, the reasoning follows a similar approach.
Under randomization of treatment, the unbiased partial likelihood score equations can be written as n i D i (ψ) = 0 (??), where . Now we consider the scenario in which treatment is nonrandomly allocated. Let P R (·) denote the data distribution under randomized treatment, and let P O (·) denote the same under non-random allocation of treatment. The observable data, with respect to treatment and outcome, for each individual, under either randomized or non-randomized allocation of treatment, is {Ā 1 (T * ),Ā 2 (T * ), T * , ∆ T }. Following ?, under the conditional exchangeability assumption (A2) and some regularity conditions, including the Positivity assumption (A3), the distribution of {Ā 1 (T * ),Ā 2 (T * ), T * , ∆ T } under P R (·) is absolutely continuous with respect to the distribution of {Ā 1 (T * ),Ā 2 (T * ), T * , ∆ T } under P O (·), and a version of the R-N derivative is where f A 1 ,A 2 (·) is the joint probability density according to which treatments A 1 and A 2 are randomly allocated, and f A 1 ,A 2 (· | ·) is the conditional joint probability density.
An estimating equation that is a function of observed data and is unbiased under the distribution of P R (·) can be re-weighted by the R-N derivative to obtain an unbiased estimating equation using the same observed data, but now under the distribution P O (·) (?). We apply the R-N derivative in (3) to construct an unbiased estimating equation using the observed data under the distribution P O (·), by noticing that the partial likelihood estimating equation n i D i (ψ) can be represented as three averages. We first rewrite D i (ψ) as Under randomized assignment, and Each of the three limits in (4), (5) and (6) is an expected value, taken over the distribution of treatment, applied to observable data under randomization. We can apply the R-N derivative in (3) to (4), (5) and (6), and obtain the weights for both the risk set and the score contribution.
The weighted weighted partial score equation is where the general form of the weight is The continuous-time is a generalization of the discrete-time weight. For example, in the discrete-time setting with (two) nonrecurrent treatments, the stabilized inverse probability weights (we suppress subscript i for brevity) are given in the prior literature (??): where t k 's are a set of ordered discrete time points common to all individuals satisfying 0 = t 0 < t 1 < t 2 < . . . ≤ t. In Section 3.2, we generalize these weights to the continuous-time setting, and further take into account the recurrent nature of the treatments.

Precise formulation of individual weights
When the number of time intervals in [0, t] increases and ds approaches zero, the finite product over the number of time intervals of the individual partial likelihood will approach a product integral (?). Therefore, as we discuss in the main manuscript, each product term in the numerator and denominator of (7) can be generalized to the continuous-time setting as where ∆A w (t) = A w (t)−A w (t − ). For individual i, both factors in (8) need to be evaluated with respect to the individual's filtered counting process {N A w iq (t) : 0 ≤ t ≤ t K i , q = 1, . . . , Q w,i }, where Q w,i is the number of initiations of treatment w for individual i.
As described in Supplementary Section 3, the number of treatment initiations for individual i, Q w,i can take three values: (i) Q w,i = 0, (ii) Q w,i = J i − 1 or (iii) Q w,i = J i . Corresponding to the three cases, the quantity in (8) can be rewritten in explicit forms as where S A w and f A w are the survival and density function of the filtered counting process for treatment A w . In alignment with conventions established in prior literature (??), the exposure weight for joint treatments A 1 and A 2 assumes a treatment order. By positing that treatment A 1 is administered infinitesimally earlier than treatment A 2 , the intensity of initiating treatment A 2 at time t can be dependent on the status of treatment A 1 at time t, A 1 (t), and the status of treatment A 2 at time t − , A 2 (t − ). Furthermore, the intensity of initiating treatment A 1 at time t can rely on A 1 (t − ) and A 2 (t − ). This assumption of ordered treatment administration is plausible in clinical contexts.
For exposition brevity, we definē Putting this all together, the individual continuous-time stabilized inverse probability weight that corrects for time-varying confounding is given by Ω A 1 ,A 2 (t) = Ω A 1 (t)Ω A 2 (t) with Ω A w being: Figure 2: Approximate convergence rates of root-mean-square errors for estimating structural parameters ψ 1 and ψ 2 utilizing the four weighting estimators detailed in Section 3.4.

Patient oxygen levels
We describe how patient oxygen levels are categorized based on the use of ventilator in Table 4.  Abbreviations: SD = standard deviation; Treatment ordersψ (95% Confidence Interval)